

Type 


Hits 


Search Text 


1 


BRS 


128 


extract$3 with common with (metadata 
or attribut$2) 


2 


BRS 


38 


S1 and "707"/$.ccls. 


3 


BRS 


23 


(extract$3 with common with (metadata 
or attribut$2)) same ( (generat$3 or 
creat$3) with (metadat or attribut$2) ) 


4 


BRS 


0 


S3 and ( (attach$3 or append$3) with 
(director$3 or folder$1) ) 


5 


BRS 


50 


(extract$3 with common with (metadata 
or attribut$2)) and ( (generat$3 or 
creat$3) with (metadat or attribut$2) ) 


6 


BRS 


0 


S5 and ( (attach$3 or append$3) with 
(director$3 or folder$1) ) 


7 


BRS 


2 


"6009439" .pn. 


8 


BRS 


0 


(extract$3 with (common with 
attribut$2)) and (append$3 with 
director$3) 


9 


BRS 


9 


(extract$3 with (meta-data or 
attribut$2)) and (append$3 with 
director$3) 


10 


BRS 


2 


(extract$3 with (meta-data or 
attribut$2)) same ( (append$3 or 
attach$3) with director$3) 


11 


BRS 


11 


S11 and "707"/$.ccls. 


12 


BRS 


36 


(extract$3 with (meta-data or 
attribut$2)) and ( (append$3 or 
attach$3) with director$3) 


13 


BRS 


431 


(extract$3 with content$1) same 
metadata 


14 


BRS 


15 


(extract$3 with content$1) same 
metadata same (directory$3 or 
folder$1) 


15 


BRS 


24 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) ) same 
metadata same (directory$3 or 
folder$1) 


16 


BRS 


0 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) ) same 
metadata same (directory$3 or 
folder$1) same (attach$3 or append$3) 


17 


BRS 


27 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) ) same 
metadata same (attach$3 or append$3) 


18 


BRS 


191 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) with 
(file$1 or folder$1)) same (attach$3 
or append$3) 


19 


BRS 


193 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) with 
(file$1 or folder$1 or director$3)) 
same (attach$3 or append$3) 


20 


BRS 


2 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) with 
(file$1 or folder$1 or director$3)) 
same (attach$3 or append$3) same 
metadata 


21 


BRS 


2 


(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or content$1) with 
(document$1 or file$1 or folder$1 or 
director$3)) same (attach$3 or 
append$3) same metadata 


22 


BRS 


101 


( (attach$3 or append$3) with 
(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or metadata) with 
(document$1 or file$1 or folder$1 or 
director$3) ) ) 


23 


BRS 


17 


S22 and "707"/$.ccls. 


24 


BRS 


16 


( (attach$3 or append$3) with 
(extract$3 with (keyword$1 or name$1 
or id or identifier$1 or hyperlink$1 
or metadata) ) ) same (director$3 or 
folder$1) 


25 


BRS 


188 


(mp3 or MP3) and ID3 


26 


BRS 


18 


S25 and (@ad<"19990427" or 
@rlad<"19990427") 


27 


BRS 


11 


S25 and (@ad<"19990427" or 
@rlad<"19990427") and (classif$5 or 
group$3 or sort$3) 


28 


BRS 


6 


S28 and "707"/$.ccls. 


29 


BRS 


30 


S22 and (@ad<"19990427" or 
@rlad<"19990427") 


30 


BRS 


7 


( (attach$3 or append$3) with 
(extract$3 with (keyword$1 or name$1 
or hyperlink$1 or metadata) with 
content with (document$1 or file$1 or 
folder$1 or director$3) ) ) 


31 


BRS 


92 


[illegible] common 
near2 (metadata or keyword$1 or name$1 
or attribute$1) ) ) same (folder$1 or 
director$3 or storage$1) 


32 


BRS 


79 


S31 and (group$4 or classif$5 or 
sort$3 ) 


33 


BRS 


24 


S32 and "707"/$.ccls. 


34 


BRS 


3 


S33 and @rlad<"19990427" 


35 


BRS 


4 


S32 and @rlad<"19990427" 


36 


BRS 


457 


( (group$4 or classif$5 or sort$3) with 
(metadata or attribute$1 or name$1 or 
keyword$1 or (file near2 
extension$1) ) ) same ( (generat$3 or 
creat$3) with (director$3 or 
folder$1) ) 


37 


BRS 


15 


( (group$4 or classif$5 or sort$3) with 
(common$5 or similar$5) with (metadata 
or attribute$1 or name$1 or keyword$1 
or (file near2 extension$1) ) ) same 
( (generat$3 or creat$3) with 
(director$3 or folder$1)) 


38 


BRS 


2 


S37 and @rlad<"19990427" 


39 


BRS 


112 


( (generat$3 or creat$3) with 
(director$3 or folder$1) ) same 
(common$4 or similar$5) near2 (name$1 
or attribute$1 or metadata or 
keyword$1 or ( (file or folder or 
director$3) near extension$1) ) 


40 


BRS 


112 


( (generat$3 or creat$3) with 
(director$3 or folder$1) ) same 
( (common$4 or similar$5) near2 (name$1 
or attribute$1 or metadata or 
keyword$1 or ((file or folder or 
director$3) near extension$1) ) ) 


41 


BRS 


22 


S40 and @rlad<"19990427" 


42 


BRS 


2 


(extract$3 with common with (metadata 
or name$1 or id or identifier$1) ) same 
( (generat$3 or creat$3) with 
(director$3 or folder$1 or categor$3) ) 


43 


BRS 


2 


S42 and 707/1,10,101,104.1.ccls. 


44 


BRS 


2 


S42 and 
707/1,10,101,104.1,103y,102z.ccls. 


45 


BRS 


2 


(extract$3 with common with (metadata 
or name$1 or id or identifier$1) ) same 
( (generat$3 or creat$3) with 
(director$3 or folder$1 or categor$3) ) 


46 


BRS 


2 


S45 and 
707/1,10,101,104.1,103y,103z.ccls. 
ABSTRACT 



Software repositories have attracted considerable attention from researchers in recent years. To 
analyze software repositories, raw data must first be extracted from the version control and 
problem tracking systems. This poses two challenges: (1) extraction requires a non-trivial effort, and 
(2) the results depend on the heuristics used during extraction. These challenges burden researchers 
who are new to the community and make it difficult to benchmark software repository mining, since it 
is almost impossible to reproduce experiments done by another team. In this paper we present the 
TA-RE corpus. TA-RE collects extracted data from software repositories in order to build a collection 
of projects that will simplify the extraction process. Additionally, the collection can be used for 
benchmarking. As a first step, we propose an exchange language that makes sharing and 
reusing data as simple as possible. 